With great theoretical and practical significance, locating influential nodes of complex networks is a promising issue. In this paper, we propose a
dynamics-sensitive (DS) centrality that integrates topological features and dynamical properties. The DS centrality can be directly applied
in locating influential spreaders. According to the empirical results on four real networks for both the susceptible-infected-recovered (SIR) and
susceptible-infected (SI) spreading models, the DS centrality is much more accurate than degree, k-shell index and eigenvector centrality.


Spreading dynamics represents many important processes in nature and society, such as the propagation of computer viruses and traffic congestion,
reaction diffusion, the spreading of infectious diseases and cascading failures. Estimating nodes' spreading influences can help
in hindering epidemics or accelerating innovation, and similar methods can be further applied in identifying influential spreaders in social
networks, quantifying the influence of scientists and their publications, evaluating the impact of injection points in the diffusion of microfinance,
finding drug targets in directed pathway networks, predicting essential proteins in protein interaction networks, and so on.

The significance of this issue has triggered a variety of novel approaches to identifying influential spreaders in networks, which can be roughly categorized
into three classes. Firstly, some scientists argued that the location of a node is more important than its immediate neighbors, and thus proposed the k-shell
index and its variants as indicators of spreading influence. Secondly, some scientists quantified a node's influence by accounting only for its
local surroundings. Thirdly, some scientists evaluated nodes' influences according to the steady states of some introduced dynamical processes,
such as random walks and iterative refinement.

The above-mentioned approaches only take into account topological features, while recent experiments indicate that the performance of structural
indices is very sensitive to the specific dynamics on networks. For example, when the spreading rate is very small, degree usually performs
better than eigenvector centrality and the k-shell index, while when the infectivity is very high, eigenvector centrality is the best of
the three (see Figs. 1 and 2, with details shown later). To the best of our knowledge, few works take into account the properties of the
underlying spreading dynamics. Via a Markov chain analysis, Klemm et al. suggested that eigenvector centrality can be used to estimate
nodes' dynamical influences in the susceptible-infected-recovered (SIR) spreading model (also called the susceptible-infected-removed model). Li et
al. provided a complementary explanation of the suitability of eigenvector centrality based on perturbation around the equilibrium of the epidemic
dynamics, and discussed the limitations of eigenvector centrality for homogeneous community networks. Neither of these works paid enough
attention to the specific parameters of the spreading models, and thus their suggested index only works well in a limited range of the parameter space.
Bauer and Lizier directly counted the number of possible infection walks of different lengths. Their method is very effective but less efficient because
of its considerable computational cost; in addition, owing to the fundamental complexity of counting the number of paths connecting two nodes, their method
cannot be formulated in a compact analytical form.

In this paper, we describe the infection probabilities of nodes by a matrix difference equation that accounts for both topological features and dynamical
properties. Accordingly, we propose a dynamics-sensitive (DS) centrality, which can be directly applied to quantifying the spreading influences of nodes.
According to the empirical results on four real networks, for both the SIR model and the susceptible-infected (SI) model, the DS centrality is
more accurate than degree, k-shell index and eigenvector centrality in locating influential nodes. The method proposed in this paper can be extended to
other Markov processes on networks.

1 Dynamics-Sensitive Centrality

An undirected network $G = (V, E)$ with $n = |V|$ nodes and $e = |E|$ edges can be described by an adjacency matrix $A = \{a_{ij}\}$, where $a_{ij} = 1$
if node $i$ is connected with node $j$, and $a_{ij} = 0$ otherwise. $A$ is binary and symmetric with zeros along the main diagonal, and thus its eigenvalues are
real and can be arranged in descending order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Since $A$ is a symmetric, real-valued matrix, it can be factorized as $A = Q\Lambda Q^{T}$,
where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, $Q = [q_1, q_2, \ldots, q_n]$, and the orthonormal vector $q_i$ is the eigenvector corresponding to $\lambda_i$.
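As a concrete illustration of these definitions, the sketch below builds $A$ from an edge list for a small toy network and checks the properties used above (binary, symmetric, zero diagonal). The function name `adjacency_matrix` and the toy graph are hypothetical choices for illustration, not part of the original method.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix A of an undirected, unweighted network.

    Nodes are labelled 0..n-1. Self-loops are skipped and repeated edges are
    collapsed, so A stays binary, symmetric and zero on the diagonal.
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i != j:
            A[i][j] = 1
            A[j][i] = 1
    return A


# Hypothetical toy network: triangle 0-1-2 plus a pendant node 3 attached to node 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = adjacency_matrix(4, toy_edges)

is_symmetric = all(A[i][j] == A[j][i] for i in range(4) for j in range(4))
zero_diagonal = all(A[i][i] == 0 for i in range(4))
num_edges = sum(map(sum, A)) // 2   # each undirected edge appears twice in A
```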

Consider a spreading model where an infected node infects each of its neighbors with spreading rate $\beta$ and recovers with recovering rate $\mu$ (see
Materials and Methods for details). We denote by $x(t)$ ($t \ge 0$) an $n \times 1$ vector whose components are the probabilities of nodes to have ever been infected no
later than time step $t$; then $x(t) - x(t-1)$ ($t \ge 1$) gives the probabilities of nodes to be infected at time step $t$. Note that $x(t)$ is a cumulative
quantity that can be larger than 1; we use the term probability just for simplicity. For example, if $i$ is the only initially infected node, then
$x_i(0) = 1$ and $x_j(0) = 0$ for $j \neq i$. In the first time step, $x(1) = \beta A x(0)$, and for $t > 1$ we have (see the derivation in Materials and Methods)

$$x(t) - x(t-1) = \beta A[\beta A + (1-\mu)I]^{t-1}x(0), \qquad (1)$$

where $I$ is the $n \times n$ identity matrix. Denoting $H = \beta A + (1-\mu)I$, the vector $\beta A H^{t-1}x(0)$ represents the probabilities of nodes to be infected at time step $t$,
and thus the probabilities of nodes to have ever been infected no later than $t$ can be rewritten as

$$x(t) = \sum_{r=2}^{t}[x(r) - x(r-1)] + x(1) = \sum_{r=0}^{t-1}\beta A H^{r}x(0). \qquad (2)$$
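A direct way to read Eqs. (1) and (2) is as an iteration: keep the vector $H^{r}x(0)$, add its image under $\beta A$ at each step, and advance it by one multiplication with $H$. The pure-Python sketch below does exactly that for a single seed; the function names and the three-node path are hypothetical choices for illustration.

```python
def matvec(M, v):
    """Dense matrix-vector product M v (M given as a list of rows)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def cumulative_infection(A, beta, mu, x0, t):
    """x(t) = sum_{r=0}^{t-1} beta * A * H^r * x(0), with H = beta*A + (1-mu)*I (Eq. 2)."""
    y = [float(v) for v in x0]          # y holds H^r x(0); r = 0 gives x(0)
    x = [0.0] * len(x0)
    for _ in range(t):
        Ay = matvec(A, y)
        # increment beta*A*H^r*x(0): equals x(1) for r = 0 and x(r+1) - x(r) otherwise (Eq. 1)
        x = [xi + beta * a for xi, a in zip(x, Ay)]
        # advance y to H^{r+1} x(0) = beta*A*y + (1 - mu)*y
        y = [beta * a + (1 - mu) * yi for a, yi in zip(Ay, y)]
    return x


# Path 0-1-2 with node 0 as the only seed, SIR case mu = 1.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x2 = cumulative_infection(A, beta=0.1, mu=1.0, x0=[1, 0, 0], t=2)
```

In this example the seed itself receives $\beta^2 = 0.01$ at $t = 2$ because walks returning to the seed are counted, even though in the real SIR process with $\mu = 1$ the seed has already recovered; this is one face of the approximation discussed later in this section.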

We define $S_i(t)$ as the spreading influence of node $i$ at time step $t$, which can be quantified by the sum of the infected probabilities of all nodes, given that $i$ is
the initially infected seed. According to Eq. (2), the infected probabilities can be written as

$$x(t) = \sum_{r=0}^{t-1}\beta A H^{r}e_i, \qquad (3)$$

where $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^{T}$ is an $n \times 1$ vector with only the $i$th element being 1. As all elements of $e_i$ other than the $i$th are zero, $x(t)$
is exactly the sum of the $i$th columns of $\beta A, \beta AH, \ldots, \beta AH^{t-1}$. Given $x(0) = e_i$, $S_i(t)$ is defined as the sum of all elements of $x(t)$, which is
equal to the sum of all elements in the $i$th columns of $\beta A, \beta AH, \ldots, \beta AH^{t-1}$, namely

$$S_i(t) = \big[(\beta A + \beta A H + \cdots + \beta A H^{t-1})^{T}L\big]_i, \qquad (4)$$

where $L = (1, 1, \ldots, 1)^{T}$ is an $n \times 1$ vector whose components are all 1. Since $A^{T} = A$, $H^{T} = H$ and $AH = HA$, the spreading
influences of all nodes can be described by the vector

$$S(t) = \sum_{r=0}^{t-1}\beta A H^{r}L. \qquad (5)$$
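Because $AH = HA$, Eq. (5) can be evaluated for all nodes at once by iterating on $L$ instead of on each $e_i$, at a cost of $t$ matrix-vector products. The sketch below is a minimal, hypothetical implementation of that iteration, using dense lists for brevity; a real network would normally be stored as adjacency lists or a sparse matrix.

```python
def matvec(M, v):
    """Dense matrix-vector product M v (M given as a list of rows)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def ds_centrality(A, beta, mu, t):
    """S(t) = sum_{r=0}^{t-1} beta * A * H^r * L with H = beta*A + (1-mu)*I (Eq. 5)."""
    n = len(A)
    y = [1.0] * n                 # y holds H^r L; r = 0 gives the all-ones vector L
    S = [0.0] * n
    for _ in range(t):
        Ay = matvec(A, y)
        S = [s + beta * a for s, a in zip(S, Ay)]                  # add beta*A*H^r*L
        y = [beta * a + (1 - mu) * yi for a, yi in zip(Ay, y)]     # y <- H y
    return S


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]                # path 0-1-2
print(ds_centrality(A, beta=0.1, mu=1.0, t=1))       # beta times the degrees
print(ds_centrality(A, beta=0.1, mu=1.0, t=2))       # adds beta^2 * (A^2 L)
```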

Notice that $\sum_{r=0}^{t-1}\beta A H^{r}L = \sum_{r=0}^{t-1}\beta A H^{r}\big(\sum_{i=1}^{n}e_i\big) = \sum_{i=1}^{n}\sum_{r=0}^{t-1}\beta A H^{r}e_i$, and $\sum_{r=0}^{t-1}\beta A H^{r}e_i$ is the vector of infected probabilities of all nodes given that
node $i$ is the only initially infected seed, according to Eq. (2). Hence $S(t)$ can also be roughly interpreted as the sum of infected probabilities over the $n$ cases
in which every node serves as the infected seed once. This relationship reveals an underlying symmetry: in an undirected network, a node with
higher influence is also more apt to be infected. Readers are warned that this conclusion is not mathematically rigorous, since we have ignored the
complicated entanglement by allowing the elements of $x(t)$ to be larger than 1.
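The symmetry argument can be checked numerically on a small example. The sketch below forms the matrix $M(t) = \sum_{r=0}^{t-1}\beta A H^{r}$, whose $i$th column is $x(t)$ for seed $i$ (Eq. (3)). Column sums are then the influences $S_i(t)$, and row sums are each node's total infected probability over all $n$ seeds; they coincide because $M(t)$ is symmetric. The star-plus-tail network, the parameter values and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Dense matrix product X Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def ds_matrix(A, beta, mu, t):
    """M(t) = sum_{r=0}^{t-1} beta * A * H^r; column i equals x(t) when node i is the seed."""
    n = len(A)
    H = [[beta * A[i][j] + (1 - mu) * (i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]   # P = H^r, starting at I
    M = [[0.0] * n for _ in range(n)]
    for _ in range(t):
        AP = matmul(A, P)
        M = [[M[i][j] + beta * AP[i][j] for j in range(n)] for i in range(n)]
        P = matmul(H, P)
    return M


# Hypothetical network: star centred on node 0 (leaves 1, 2, 3) with a tail 3-4.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
n = 5
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

M = ds_matrix(A, beta=0.2, mu=0.5, t=4)
influence = [sum(M[r][i] for r in range(n)) for i in range(n)]   # column sums: S_i(t)
exposure = [sum(M[i][c] for c in range(n)) for i in range(n)]    # row sums: sum over seeds
```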

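Since $H = \beta A + (1-\mu)I$ is a linear function of $A$, the eigenvectors of $H$ are the same as those of $A$, and $\beta\lambda_i + 1 - \mu$ is the $i$th eigenvalue of $H$,
corresponding to $q_i$. By the Perron–Frobenius theorem, $|\lambda_i| \le \lambda_1$ for all $i$, so when $\beta\lambda_1 + 1 - \mu < 1$, i.e., $\beta/\mu < 1/\lambda_1$ (for $0 < \mu \le 1$), all eigenvalues of $H$ lie in $(-1, 1)$,
$H^{t}L$ converges to the null vector as $t \to \infty$, and the limit of $S(t)$ can be written as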

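$$S(\infty) = \lim_{t\to\infty}S(t) = \beta A(I - H)^{-1}L = \Big[\frac{\beta}{\mu}A + \Big(\frac{\beta}{\mu}\Big)^{2}A^{2} + \cdots + \Big(\frac{\beta}{\mu}\Big)^{t}A^{t} + \cdots\Big]L. \qquad (6)$$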
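In the convergent regime, Eq. (6) can be evaluated without summing the series, by solving the linear system $(\mu I - \beta A)z = L$ and setting $S(\infty) = \beta A z$, since $I - H = \mu I - \beta A$. The sketch below, with a small Gauss–Jordan solver written out so that the block is self-contained, compares that closed form with a long truncated sum of Eq. (5). The parameter values and helper names are illustrative assumptions.

```python
def matvec(M, v):
    """Dense matrix-vector product M v (M given as a list of rows)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * pc for a, pc in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ds_limit(A, beta, mu):
    """S(inf) = beta*A*(mu*I - beta*A)^{-1} L, valid when beta/mu < 1/lambda_1 (Eq. 6)."""
    n = len(A)
    K = [[mu * (i == j) - beta * A[i][j] for j in range(n)] for i in range(n)]
    z = solve(K, [1.0] * n)
    return [beta * v for v in matvec(A, z)]


def ds_truncated(A, beta, mu, t):
    """Partial sum of Eq. (5): sum_{r=0}^{t-1} beta*A*H^r*L."""
    y, S = [1.0] * len(A), [0.0] * len(A)
    for _ in range(t):
        Ay = matvec(A, y)
        S = [s + beta * a for s, a in zip(S, Ay)]
        y = [beta * a + (1 - mu) * yi for a, yi in zip(Ay, y)]
    return S


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # path 0-1-2, lambda_1 = sqrt(2)
closed = ds_limit(A, beta=0.2, mu=0.5)      # beta/mu = 0.4 < 1/sqrt(2)
series = ds_truncated(A, beta=0.2, mu=0.5, t=300)
```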

For the SIR model, without loss of generality, we set $\mu = 1$, and then

$$S(t) = (\beta A + \beta^{2}A^{2} + \cdots + \beta^{t}A^{t})L, \qquad (7)$$

where $(A^{t}L)_i$ counts the total number of walks of length $t$ from node $i$ to all nodes in the network, weighted by $\beta^{t}$, which decays as the
length $t$ increases. As $S(t)$ quantifies nodes' spreading influences, we call it the dynamics-sensitive (DS) centrality, where the term dynamics-sensitive emphasizes
that $S(t)$ is determined not only by the network structure (i.e., $A$) but also by the dynamical parameters (i.e., $\beta$ and $t$). In particular, when $t = 1$,
the initially infected node only has the chance to infect its neighbors, and $S_i(1) = (\beta AL)_i$, with $(AL)_i$ being exactly the degree of node $i$. When $\mu = 0$
(corresponding to the SI model) or $\beta > 1/\lambda_1$, $S(t)$ diverges as $t \to \infty$ and therefore cannot reflect the spreading influences in that limit. In fact, as we
allow the infected probability of a node to accumulate and exceed 1, the DS centrality may deviate considerably from the real spreading influences at
large $\beta$ and large $t$. This is because our theoretical derivation contains the underlying approximation $1 - (1-\beta)^{m} \approx m\beta$, where the left-hand side
characterizes the real spreading process, in which any infection probability stays below 1, and the right-hand side can exceed 1 when $\beta$ and $m$ are large.
Since $m$ denotes the number of contacts from infected nodes, a larger $t$ results in a larger $m$ before the end of the epidemic. Note that our
main goal is to find the ranking of nodes' spreading influences, namely to identify influential nodes; since every node's influence is overestimated, the impact
of this deviation on the ranking is not known a priori. Fortunately, as we will show later, the DS centrality performs
much better than other well-known indices over a very broad range of $\beta$ and $t$ that covers most practical scenarios.
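The size of the overestimation is easy to see numerically. For a susceptible node with $m$ independent contacts, the exact infection probability is $1 - (1-\beta)^{m}$, whereas the linearized update behind Eq. (7) contributes $m\beta$. The short sketch below tabulates both for a few illustrative $(\beta, m)$ pairs chosen for this example, not taken from the paper.

```python
def exact_prob(beta, m):
    """Probability that at least one of m independent contacts transmits the infection."""
    return 1.0 - (1.0 - beta) ** m


def linearized_prob(beta, m):
    """First-order approximation used in the derivation; it can exceed 1."""
    return m * beta


for beta, m in [(0.01, 5), (0.05, 5), (0.1, 20)]:
    e, l = exact_prob(beta, m), linearized_prob(beta, m)
    print(f"beta={beta:<5} m={m:<3} exact={e:.4f} linear={l:.4f} ratio={l / e:.3f}")
```

For small $\beta m$ the two values agree closely, while for $\beta = 0.1$ and $m = 20$ the linearized value exceeds 1, in line with the caveat that the DS centrality is less faithful at large $\beta$ and $t$.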

2 Results 

We test the performance of the DS centrality in evaluating nodes' spreading influences according to the SIR and SI models, with varying
spreading rate $\beta$. Four real networks, namely a scientific collaboration network, an email communication network, the Internet at the router level
and a protein-protein interaction network, are used for the empirical analysis (see the data description in Materials and Methods), and three well-known
indices, namely degree, k-shell index and eigenvector centrality, are used as benchmark methods for comparison (see Materials and Methods for their
definitions). Given the time step $t$, the spreading influence of an arbitrary node $i$ is quantified by the number of infected nodes (for the SI
model) or the number of infected and recovered nodes (for the SIR model) at $t$, where the spreading process starts with only node $i$ infected.
Here we use Kendall's tau $\tau$ to measure the correlation between nodes' spreading influences and the considered centrality measure, where $\tau$ lies in the
range $[-1, 1]$ and a larger $\tau$ corresponds to better performance (see Materials and Methods for the definition of $\tau$).

As shown in Fig. 1, Kendall's tau $\tau$ for the DS centrality lies between 0.968 and 0.995 for $\beta \in [0.01, 0.1]$, indicating that the ranking lists
generated by the DS centrality and by the real SIR spreading process are nearly identical. In comparison, the DS centrality performs much
better than degree, k-shell index and eigenvector centrality. As shown in Fig. 2, a similar result is observed for the SI model, where the DS centrality
again performs much better than the others. The results for larger $\beta$ and $t$ are shown in Fig. S1 and Fig. S2 of the Supplementary Information, respectively, where
the DS centrality still performs the best.

Figure 1 | The accuracy of four centrality measures in evaluating nodes' spreading influences according to the SIR model ($\mu = 1$) in the four real networks,
quantified by Kendall's tau. The spreading rate $\beta$ varies from 0.01 to 0.10, and the time step is set as $t = 5$. Each data point is obtained by averaging over
10^ independent runs.

Since $A$ is a symmetric, real-valued matrix, the DS centrality $S(t)$ can be written in the following way by decomposing $A$:

$$S_i(t) = m_1 q_{1i}\sum_{j=1}^{n}q_{1j} + \sum_{r=2}^{n}m_r q_{ri}\sum_{j=1}^{n}q_{rj}, \qquad (8)$$

where $q_{ri}$ is the $i$th element of $q_r$ and $m_r = \beta\lambda_r[1 - (\beta\lambda_r + 1 - \mu)^{t}](\mu - \beta\lambda_r)^{-1}$ for $1 \le r \le n$ (assuming $\mu \neq \beta\lambda_r$). Dividing Eq. (8) by $m_1$ gives

$$\frac{S_i(t)}{m_1} = q_{1i}\sum_{j=1}^{n}q_{1j} + \sum_{r=2}^{n}\frac{m_r}{m_1}\,q_{ri}\sum_{j=1}^{n}q_{rj}. \qquad (9)$$

As $t$ and $\beta$ increase, $m_r/m_1$ ($r \ge 2$) converges to 0, and thus the ranking list generated by $S(t)$ becomes identical to that given by $q_1$, which is exactly the
eigenvector centrality. This relationship is in accordance with the results presented in Fig. S1 and Fig. S2, where the difference between the eigenvector
centrality and the DS centrality gets smaller as $\beta$ and $t$ increase.
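For completeness, here is one way to obtain the coefficients $m_r$ in Eq. (8) from Eq. (5), assuming the orthonormal decomposition $A = Q\Lambda Q^{T}$ introduced above and $\mu \neq \beta\lambda_r$. Since $H = Q[\beta\Lambda + (1-\mu)I]Q^{T}$, every term of Eq. (5) is diagonal in the eigenbasis, and the sum becomes a geometric series for each eigenvalue (the summation index is written as $k$ to avoid a clash with the eigenvalue index $r$):

$$
\begin{aligned}
S(t) &= \sum_{k=0}^{t-1}\beta A H^{k}L
      = Q\Big[\sum_{k=0}^{t-1}\beta\Lambda\big(\beta\Lambda + (1-\mu)I\big)^{k}\Big]Q^{T}L,\\
\sum_{k=0}^{t-1}\beta\lambda_r(\beta\lambda_r + 1-\mu)^{k}
 &= \beta\lambda_r\,\frac{1-(\beta\lambda_r + 1-\mu)^{t}}{1-(\beta\lambda_r + 1-\mu)}
  = \frac{\beta\lambda_r[1-(\beta\lambda_r + 1-\mu)^{t}]}{\mu-\beta\lambda_r} = m_r,\\
S_i(t) &= \sum_{r=1}^{n}m_r\,q_{ri}\,(q_r^{T}L) = \sum_{r=1}^{n}m_r\,q_{ri}\sum_{j=1}^{n}q_{rj}.
\end{aligned}
$$

Splitting off the $r = 1$ term of the last line gives Eq. (8).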

3 Discussion 


Estimating spreading influences and then identifying influential nodes are fundamental tasks before any regulation of a spreading process.
For such tasks, most known works only take into account topological information. Recently, Aral and Walker showed that the attributes of nodes
are highly correlated with nodes' influences and their tendencies to be influenced. In this paper, in addition to the topological information, we take into account the
underlying spreading dynamics and propose a dynamics-sensitive (DS) centrality, which is a weighted sum of walks ending at the target node,
where both the spreading rate and the spreading time enter the weighting function. The DS centrality can be directly applied to quantifying the
spreading influences of nodes, and it remarkably outperforms degree, k-shell index and eigenvector centrality according to the empirical analyses of the
SIR and SI models on four real networks. The DS centrality performs particularly well in the early stage of spreading, which makes it a powerful
tool for the early detection of potential super-spreaders in epidemic control.

The DS centrality highlights an often ignored fact: which nodes are the most influential depends not only on the network topology but also on the
spreading dynamics. Given different models and parameters, the relative influences of nodes also differ. Roughly speaking, if the spreading
rate is small, we can focus on the close neighborhood of a node since it is not easy to form a long spreading pathway (i.e., $\beta^{t}$ decays very fast as
$t$ increases when $\beta$ is small), while if the spreading rate is high, the global topology should be considered. A clear limitation of this work is that
before calculating the DS centrality, we have to know the spreading rate, which is usually a hidden parameter. This parameter can be estimated
from the early spreading process, and then we can calculate the DS centralities by varying the spreading rate over the estimated range and
see which nodes are the most influential on average.

Some other centralities related to specific dynamical processes have also been proposed recently, including routing centrality, epidemic
centrality, diffusion centrality, percolation centrality and game centrality. Compared with these centralities, and similar to the works by Klemm et
al., this paper provides a more general framework that could in principle deal with all networked Markov processes, and thus it can be extended and
applied to many other important dynamics, such as the Ising model, Boolean dynamics, the voter model, synchronization, and so on. Furthermore,
the DS centrality can be directly extended to asymmetric (directed) networks and weighted networks. We hope this work highlights the significant
role of the underlying dynamics in quantifying individual nodes' importance, and that the differences between lists of critical nodes may give us novel
insights into the rarely noticed distinguishing properties of different dynamical processes.

Figure 2 | The accuracy of four centrality measures in evaluating nodes' spreading influences according to the SI model ($\mu = 0$) in the four real networks,
quantified by Kendall's tau. The spreading rate $\beta$ varies from 0.01 to 0.10, and the time step is set as $t = 5$. Each data point is obtained by averaging over
10^ independent runs.

4 Materials and Methods 

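Derivation of Eq. (1). The probabilities of nodes to be infected at time step $t = 2$ can be approximated as

$$x(2) - x(1) = \beta A x(1) + \beta A(1-\mu)x(0) = \beta A[\beta A + (1-\mu)I]x(0), \qquad (10)$$

where the first term accounts for infections caused by the nodes infected at step 1, and the second for infections caused by the seed, which is still infectious with probability $1 - \mu$.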

We assume that, for $2 \le t \le p$, $x(t) - x(t-1) = \beta A[\beta A + (1-\mu)I]^{t-1}x(0)$; then for $t = p + 1$ we have

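$$
\begin{aligned}
x(p+1) - x(p) &= \beta A\Big\{\sum_{r=0}^{p-2}(1-\mu)^{r}[x(p-r) - x(p-r-1)] + (1-\mu)^{p-1}x(1) + (1-\mu)^{p}x(0)\Big\}\\
&= \beta A\Big\{\sum_{r=0}^{p-2}(1-\mu)^{r}\beta A[\beta A + (1-\mu)I]^{p-r-1} + (1-\mu)^{p-1}[\beta A + (1-\mu)I]\Big\}x(0)\\
&= \beta A\Big\{\beta A[\beta A + (1-\mu)I]^{p-1} + (1-\mu)\Big(\sum_{r=0}^{p-3}(1-\mu)^{r}\beta A[\beta A + (1-\mu)I]^{p-r-2} + (1-\mu)^{p-2}[\beta A + (1-\mu)I]\Big)\Big\}x(0)\\
&= \beta A\big\{\beta A[\beta A + (1-\mu)I]^{p-1} + (1-\mu)[\beta A + (1-\mu)I]^{p-1}\big\}x(0)\\
&= \beta A[\beta A + (1-\mu)I]^{p}x(0). \qquad (11)
\end{aligned}
$$

In the first line, a node infected at step $p - r$ is still infectious at step $p$ with probability $(1-\mu)^{r}$; the second line uses the induction hypothesis and $x(1) = \beta A x(0)$; the third line separates the $r = 0$ term; and the fourth line uses the fact that the expression in parentheses equals $[\beta A + (1-\mu)I]^{p-1}$, which is the same identity as in the braces of the second line with $p$ replaced by $p - 1$ and follows by applying the same argument recursively (for $p = 2$ the sum is empty and both sides reduce to $\beta A + (1-\mu)I$).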

Therefore, by mathematical induction, Eq. (1) holds for all $t \ge 2$.
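The induction can also be sanity-checked numerically. The sketch below builds the increments step by step from the first line of Eq. (11), i.e., every node infected at an earlier step $s$ is still infectious at step $p$ with probability $(1-\mu)^{p-s}$, and compares them with the closed form $\beta A[\beta A + (1-\mu)I]^{t-1}x(0)$ of Eq. (1). The toy network, parameter values and function names are arbitrary choices for illustration.

```python
def matvec(M, v):
    """Dense matrix-vector product M v (M given as a list of rows)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def increments_by_recursion(A, beta, mu, x0, T):
    """d[0] = x(0), d[1] = x(1), d[s] = x(s) - x(s-1) for s >= 2, built as in Eq. (11)."""
    n = len(x0)
    d = [[float(v) for v in x0]]
    for p in range(T):
        # infectious mass at step p: nodes infected at step s survive with prob (1-mu)^(p-s)
        infectious = [sum((1 - mu) ** (p - s) * d[s][k] for s in range(p + 1)) for k in range(n)]
        d.append([beta * v for v in matvec(A, infectious)])
    return d


def increments_closed_form(A, beta, mu, x0, T):
    """beta*A*H^(t-1)*x(0) for t = 1..T, with H = beta*A + (1-mu)*I (Eq. 1)."""
    y, out = [float(v) for v in x0], []
    for _ in range(T):
        Ay = matvec(A, y)
        out.append([beta * a for a in Ay])
        y = [beta * a + (1 - mu) * yi for a, yi in zip(Ay, y)]
    return out


A = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]   # triangle plus pendant
rec = increments_by_recursion(A, beta=0.15, mu=0.3, x0=[0, 0, 0, 1], T=6)
cf = increments_closed_form(A, beta=0.15, mu=0.3, x0=[0, 0, 0, 1], T=6)
max_gap = max(abs(a - b) for t in range(6) for a, b in zip(rec[t + 1], cf[t]))
```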

Spreading Model. Here we apply the standard susceptible-infected-recovered (SIR) model (also called the susceptible-infected-removed model). In
the SIR model, there are three kinds of individuals: (i) susceptible individuals that could be infected, (ii) infected individuals that have been infected and
are able to infect susceptible individuals, and (iii) recovered individuals that have recovered and will never be infected again. In this paper, the
spreading process starts with only one seed node infected, and all other nodes are initially susceptible. At each time step, each infected
node makes contact with its neighbors, and each susceptible neighbor is infected with probability $\beta$. Then each infected node enters the recovered
state with probability $\mu$. Without loss of generality, we set $\mu = 1$. In the standard SI model, nodes can only be susceptible or infected, corresponding
to the case $\mu = 0$.
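A minimal Monte Carlo sketch of the discrete-time process just described is given below. It returns the number of infected plus recovered nodes after $t$ steps, which is the spreading influence used in the Results (for the SI model, $\mu = 0$, no node recovers and this is simply the number of infected nodes). Two details are assumptions rather than statements from the paper: updates are synchronous, and a node infected during a step starts infecting others only from the next step, consistent with Eq. (10). The function names and the adjacency-list input format are hypothetical.

```python
import random


def simulate_spread(neighbors, seed, beta, mu, t, rng):
    """One run of discrete-time SIR (0 < mu <= 1) or SI (mu = 0) started from `seed`.

    Returns the number of nodes that are infected or recovered after t steps.
    """
    infected, recovered = {seed}, set()
    for _ in range(t):
        newly = set()
        for u in infected:
            for v in neighbors[u]:
                # each contact with a susceptible neighbour transmits independently with prob beta
                if v not in infected and v not in recovered and v not in newly and rng.random() < beta:
                    newly.add(v)
        # nodes infected at the start of this step recover with probability mu
        staying = {u for u in infected if rng.random() >= mu}
        recovered |= infected - staying
        infected = staying | newly
    return len(infected) + len(recovered)


def spreading_influence(neighbors, seed, beta, mu, t, runs=1000, rng=None):
    """Average of simulate_spread over independent runs (seeded for reproducibility)."""
    if rng is None:
        rng = random.Random(0)
    return sum(simulate_spread(neighbors, seed, beta, mu, t, rng) for _ in range(runs)) / runs


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(spreading_influence(path, seed=0, beta=0.5, mu=1.0, t=3))
```

On this path the expected value is $1 + 0.5 + 0.25 + 0.125 = 1.875$, so the printed average should be close to it.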












Table 1 | Basic statistical features of the Erdos, Email, Router and Protein networks, including the number of nodes $n$, the number of edges $e$, the average
degree $\langle k\rangle$ and the reciprocal of the largest eigenvalue $1/\lambda_1$.

| Network | $n$ | $e$ | $\langle k\rangle$ | $1/\lambda_1$ |
|---|---|---|---|---|
| Erdos | 456 | 1314 | 5.763 | 0.079 |
| Email | 1133 | 5451 | 9.622 | 0.048 |
| Router | 2114 | 6632 | 6.274 | 0.036 |
| Protein | 2783 | 6007 | 4.317 | 0.063 |
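As a quick consistency check on Table 1, the average degree of an undirected network is $\langle k\rangle = 2e/n$, since each edge contributes to the degree of two nodes. The snippet below recomputes that column from $n$ and $e$; the $1/\lambda_1$ column cannot be checked this way without the network data.

```python
# (n, e) pairs copied from Table 1
table1 = {
    "Erdos": (456, 1314),
    "Email": (1133, 5451),
    "Router": (2114, 6632),
    "Protein": (2783, 6007),
}

# each undirected edge adds 1 to the degree of both endpoints, so <k> = 2e / n
avg_degree = {name: round(2 * e / n, 3) for name, (n, e) in table1.items()}
print(avg_degree)
```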


Benchmark Methods. The degree of an arbitrary node $i$ is defined as the number of its neighbors, namely

$$k_i = \sum_{j=1}^{n}a_{ij}, \qquad (12)$$

where $a_{ij}$ is the $(i, j)$ element of $A$. Degree is widely applied for its simplicity and low computational cost, and it works especially well in evaluating
nodes' spreading influences when the spreading rate is small.

The main idea of eigenvector centrality is that a node's importance is determined not only by itself but also by its neighbors' importance.
Accordingly, the eigenvector centrality of node $i$, $v_i$, is defined as

$$v_i = \frac{1}{\lambda}\sum_{j=1}^{n}a_{ij}v_j, \qquad (13)$$

where $\lambda$ is a constant. Eq. (13) can be written in the compact form

$$Av = \lambda v, \qquad (14)$$

where $v = (v_1, v_2, \ldots, v_n)^{T}$. That is to say, $v$ is an eigenvector of the adjacency matrix $A$ and $\lambda$ is the corresponding eigenvalue. Since the influences
of nodes should be nonnegative, according to the Perron–Frobenius theorem $v$ must be the eigenvector corresponding to the largest eigenvalue of $A$ (for a connected network), i.e., $v = q_1$.
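In practice $q_1$ is usually obtained iteratively rather than by a full eigendecomposition. The sketch below uses power iteration on $A + I$; the shift by the identity is our own choice, made so that the iteration also converges on bipartite networks, where $\lambda_n = -\lambda_1$ and plain power iteration on $A$ can oscillate. It assumes a connected network, and the function name and tolerances are illustrative. The Rayleigh quotient $v^{T}Av$ returned alongside estimates $\lambda_1$, whose reciprocal is the threshold $1/\lambda_1$ listed in Table 1.

```python
import math


def eigenvector_centrality(A, max_iter=10000, tol=1e-12):
    """Return (lambda_1, q_1) for a connected undirected network via shifted power iteration."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n                 # positive, unit-norm start vector
    for _ in range(max_iter):
        # multiply by (A + I): same eigenvectors as A, eigenvalues shifted by +1
        w = [sum(A[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(Av, v))      # Rayleigh quotient v^T A v (v has unit norm)
    return lam, v


star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]   # hub 0 with three leaves
lam1, q1 = eigenvector_centrality(star)
```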

Kitsak et al. argued that the k-shell index (i.e., coreness) is a better index than degree for locating influential nodes. The k-shell index can be obtained
by the so-called k-core decomposition. The decomposition starts by removing all nodes with degree $k = 1$. This may cause new
nodes with degree $k \le 1$ to appear; these are also removed, and the process continues until all remaining nodes have degree $k > 1$. The removed
nodes (together with their links) form the 1-shell, and their k-shell indices are all 1. We next repeat this pruning process for nodes of degree
$k = 2$ to extract the 2-shell; that is, in each step the nodes with degree $k \le 2$ are removed. We continue the process until all
higher-layer shells have been identified and all network nodes have been removed. Each node $i$ is then assigned the index of the shell it belongs to as its k-shell index.
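The pruning procedure above translates almost line by line into code. The sketch below is one straightforward (not the fastest) implementation, assuming the network is given as a dictionary of neighbor lists with no isolated nodes; the function name and the toy graphs are hypothetical.

```python
def k_shell(neighbors):
    """Return {node: k-shell index} by iterative pruning (k-core decomposition)."""
    degree = {u: len(vs) for u, vs in neighbors.items()}
    remaining = set(neighbors)
    shell = {}
    k = 1
    while remaining:
        # start with every remaining node whose current degree is at most k
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:        # already pruned via another path
                continue
            remaining.remove(u)
            shell[u] = k
            for v in neighbors[u]:
                if v in remaining:
                    degree[v] -= 1        # removing u lowers its neighbours' degrees
                    if degree[v] <= k:    # may expose new nodes of degree <= k
                        stack.append(v)
        k += 1
    return shell


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(k_shell(toy))
```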

Kendall's Tau. For each node $i$, we denote by $y_i$ its spreading influence and by $z_i$ the value of the target centrality measure (e.g., degree, k-shell index, eigenvector
centrality or DS centrality). The accuracy of the target centrality in evaluating nodes' spreading influences can be quantified by Kendall's tau as

$$\tau = \frac{2}{n(n-1)}\sum_{i<j}\mathrm{sgn}[(y_i - y_j)(z_i - z_j)],$$

where $\mathrm{sgn}(y)$ is the sign function: $\mathrm{sgn}(y) = +1$ when $y > 0$, $\mathrm{sgn}(y) = -1$ when $y < 0$, and $\mathrm{sgn}(y) = 0$ when $y = 0$. $\tau$ measures the correlation between
two ranking lists; its value lies in the range $[-1, 1]$ and a larger $\tau$ corresponds to better performance.
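The definition can be coded directly as an $O(n^2)$ double loop. As written, ties contribute 0 and no tie correction is applied (the variant often called tau-a); library routines such as SciPy's `scipy.stats.kendalltau` default to the tie-corrected tau-b, so results may differ slightly when there are many ties, e.g., for the k-shell index. The function names below are hypothetical.

```python
def sgn(y):
    """Sign function: +1 for y > 0, -1 for y < 0, 0 for y == 0."""
    return (y > 0) - (y < 0)


def kendall_tau(y, z):
    """Kendall's tau between spreading influences y and centrality values z, as defined above."""
    n = len(y)
    if n != len(z) or n < 2:
        raise ValueError("y and z must have the same length n >= 2")
    total = sum(sgn((y[i] - y[j]) * (z[i] - z[j]))
                for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))


print(kendall_tau([1, 2, 3], [1, 3, 2]))   # one discordant pair out of three
```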

Data description. Four real networks are studied in this paper: (i) Erdos, a scientific collaboration network, where nodes are scientists and
edges represent co-authorships. The data set can be freely downloaded from http://wwwp.oakland.edu/enp/thedata/. (ii) Email, the
email communication network of the University Rovira i Virgili (URV) in Spain, involving faculty members, researchers, technicians, managers,
administrators and graduate students. (iii) Router, the Internet at the router level, where each node represents a router and an edge represents a
connection between two routers. (iv) Protein, an initial version of a proteome-scale map of human binary protein-protein interactions. Basic statistical
properties of the above four networks are presented in Table 1.